Method and apparatus for performing biosequence similarity searching

ABSTRACT

A system and method for performing biological sequence similarity searching is disclosed. This includes a programmable logic device configured to include a pipeline that comprises a matching stage, the matching stage being configured to receive a data stream comprising a plurality of possible matches between a plurality of biological sequence data strings and a plurality of substrings of a query string. The pipeline may further include a ungapped extension prefilter stage located downstream from the matching stage, the prefilter stage being configured to shift through pattern matches between the biological sequence data strings and the plurality of substrings of a query string and provide a score so that only pattern matches that exceed a user defined score will pass downstream from the prefilter stage. The matching stage may include at least one Bloom filter.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No.60/658,418, filed on Mar. 3, 2005 and claims the benefit of U.S.Provisional Application No. 60/736,081, filed on Nov. 11, 2005.

FIELD OF THE INVENTION

The present invention relates to the field of biosequence similaritysearching. In particular, the present invention relates to the field ofsearching large databases of biological sequences for strings that matcha query sequence.

BACKGROUND OF THE INVENTION

The amount of biosequence data being produced each year is growingexponentially. Extracting useful information from this massive amount ofdata efficiently is becoming an increasingly difficult task. Thedatabases of genomic DNA and protein sequences are an essential resourcefor modem molecular biology. This is where a computational search ofthese databases can show that a DNA sequence acquired in the lab issimilar to other sequences of a known biological function, revealingboth its role in the cell and its history over evolutionary time. Adecade of improvement in DNA sequencing technology has drivenexponential growth of biosequence databases such as NCBI GenBank, whichhas doubled in size every twelve (12) to sixteen (16) months for thelast decade and now stands at over forty-five (45) billion characters.These technological gains have also generated more novel sequences,including entire mammalian genomes, which keep search engines very busy.

Examples of this type of searching can be found in a paper entitledBiosequence Similarity Search On The Mercury System, P. Krishnamurthy,J. Buhler, R. D. Chamberlain, M. A. Franklin, K. Gyang, and J.Lancaster, In Proceedings of the 15th IEEE International Conference onApplication-Specific Systems, Architectures, and Processors (ASAP04),Sep. 2004, Pages 365-375 (2004). Another example is a paper entitled:NCBI BLASTN STAGE 1 IN RECONFIGURABLE HARDWARE, by Kwame Gyang,Department of Computer Science and Engineering, Washington University,Aug. 2004, Technical Report WUCSE-2005-30. Another example is a paperentitled “BLASTN Redundancy Filter in Reprogrammable Hardware”, by C.Behrens, J. Lancaster, and B. Wun, Department of Computer Science andEngineering, Washington University, Final Project Submission, Fall 2003.These papers are each incorporated by reference, in their entirety.

There is a growing need for a fast and cost effective mechanism forextracting useful information from biosequence data.

SUMMARY OF INVENTION

In one aspect of this invention, a digital logic circuit for performingbiological sequence similarity searching is disclosed. This digitallogic circuit includes a programmable logic device configured to includea pipeline comprising a Bloom filter stage, the Bloom filter stage beingconfigured to receive a data stream comprising a plurality of biologicalsequence data strings and filter the biological sequence data stream toidentify a plurality of possible matches between the sequence datastrings and a plurality of substrings of a query string.

In another aspect of this invention, a system for performing seededalignment searching is disclosed. This system includes reconfigurablehardware configured to implement a seeded alignment searching pipeline,the pipeline comprising a Bloom filter stage, the Bloom filter stagebeing configured to receive a data stream, the data stream comprising aplurality of data strings and filter the data stream to identify aplurality of possible matches between the stream data strings and aplurality of substrings of a query string.

In still another aspect of this invention, a method for performingbiological sequence similarity searching is disclosed. This methodincludes processing a biological sequence data stream through a Bloomfilter stage implemented on a programmable logic device, the biologicalsequence data stream comprising a plurality of biological sequence datastrings, the Bloom filter stage comprising a Bloom filter that isprogrammed with a plurality of substrings of a query string andconfigured to identify a plurality of possible matches between thebiological sequence data strings and the query substrings.

In yet another aspect of this invention, a digital logic circuit forperforming biological sequence similarity searching is disclosed. Thisdigital logic circuit includes a programmable logic device configured toinclude a pipeline that comprises a matching stage, the matching stagebeing configured to receive a data stream comprising a plurality ofpossible matches between a plurality of biological sequence data stringsand a plurality of substrings of a query string and the pipeline furthercomprises a prefilter stage located downstream from the matching stage,the prefilter stage being configured to shift through pattern matchesbetween the biological sequence data strings and the plurality ofsubstrings of a query string and provide a score so that only patternmatches that exceed a user defined score will pass downstream from theprefilter stage.

In still yet another aspect of this invention, a method for performingbiological sequence similarity searching is disclosed. This methodincludes processing a biological sequence data stream with aprogrammable logic device configured to include a pipeline thatcomprises a matching stage, the matching stage being configured toreceive a data stream comprising a plurality of possible matches betweena plurality of biological sequence data strings and a plurality ofsubstrings of a query string and utilizing a prefilter stage locateddownstream from the matching stage in the pipeline, the prefilter stagebeing configured to shift through pattern matches between the biologicalsequence data strings and the plurality of substrings of a query stringand provide a score so that only pattern matches that exceed a userdefined score pass will downstream from the prefilter stage.

These are merely some of the innumerable aspects of the presentinvention and should not be deemed an all-inclusive listing of theinnumerable aspects associated with the present invention.

BRIEF DESCRIPTION OF DRAWINGS

For a better understanding of the present invention, reference may bemade to the accompanying drawings in which:

FIG. 1 provides a block diagram overview of a three (3) stage pipelinein accordance with a preferred, nonlimiting, illustrative embodiment ofthe present invention broken down into three (3) main stages;

FIG. 2 provides an exemplary block diagram of a Stage 1 pipeline, e.g.,BLASTN, in accordance with a preferred, nonlimiting, illustrativeembodiment of the present invention;

FIG. 3 is a general schematic of a hardware platform for a biologicaldata pipeline in accordance with a preferred, nonlimiting, illustrativeembodiment of the present invention;

FIG. 4 depicts a sample portion of a biological sequence data stream;

FIG. 5 depicts a query sequence can be broken down into multiple w-mersin connection with the database sequence shown in FIG. 3;

FIG. 6(a) depicts an exemplary Bloom filter stream in accordance with apreferred, nonlimiting, illustrative embodiment of the presentinvention;

FIG. 6(b) shows an arrangement of Bloom filters configured to share thesame dual port block random access memories (BRAMs) in accordance with apreferred, nonlimiting, illustrative embodiment of the presentinvention;

FIG. 6(c) depicts a Bloom filter stage that comprises sixteen (16) Bloomfilters formed from four (4) Bloom filter arrangements, as shown in FIG.6(b), in accordance with a preferred, nonlimiting, illustrativeembodiment of the present invention;

FIG. 7 is a schematic and flow chart of control logic that is eitherwithin or associated with a Bloom filter stage and readily configured toidentify the starting position within the database sequence of thedatabase w-mer that is a possible match with a query w-mer;

FIG. 8 illustrates an exemplary hashing stage and hash table inaccordance with a preferred, nonlimiting, illustrative embodiment of thepresent invention;

FIG. 9 depicts an exemplary data window associated with the analysis ofungapped extension with analysis in two directions within a predefinedwindow of data;

FIG. 10 depicts an exemplary data window associated with the analysis ofungapped extension with analysis in a single direction within apredefined data window in accordance with a preferred, nonlimiting,illustrative embodiment of the present invention;

FIG. 11 is a block diagram overview of the three (3) stage pipeline ofFIG. 1, wherein the first and second stages are each broken down intotwo (2) substages;

FIG. 11A is a block diagram overview of the three (3) stage pipeline ofan alternative embodiment of the present invention shown in FIG. 1,wherein only the first stage is broken down into two (2) substages;

FIG. 12 is an illustrative, but nonlimiting, listing of pseudocodeutilized in an ungapped extension algorithm in accordance with apreferred, nonlimiting, illustrative embodiment of the presentinvention;

FIG. 13 is an illustrative, but nonlimiting, block diagram of aprefilter stage for an ungapped extension in accordance with apreferred, nonlimiting, illustrative embodiment of the presentinvention;

FIG. 13A is an illustrative, but nonlimiting, block diagram of a windowlookup module of the prefilter stage for an ungapped extension, as shownin FIG. 13;

FIG. 14 is an illustrative, but nonlimiting, block diagram of a basecomparator, as shown in FIG. 13, in accordance with a preferred,nonlimiting, illustrative embodiment of the present invention;

FIG. 15 is an illustrative, but nonlimiting, block diagram of an arrayof scoring stages, as shown in FIG. 13, in accordance with a preferred,nonlimiting, illustrative embodiment of the present invention;

FIG. 16 is an illustrative, but nonlimiting, block diagram of a singletwo step scoring stage, as shown in FIG. 15, in accordance with apreferred, nonlimiting, illustrative embodiment of the presentinvention;

FIG. 17 is a graphical representation of throughput of an overallpipeline as a function of ungapped extension throughput for queries ofsizes 20 kbases and 25 kbases;

FIG. 18 is a graphical representation of speedup of an overall pipelineas a function of ungapped extension throughput for queries of sizes 20kbases and 25 kbases with prefiltering in accordance with a preferred,nonlimiting, illustrative embodiment of the present invention;

FIG. 19 is a table of sensitivity results for an illustrative, butnonlimiting, prefilter for the ungapped extension in accordance with apreferred, nonlimiting, illustrative embodiment of the presentinvention; and

FIG. 20 is a flow chart of the algorithm shown in FIG. 12 utilized in anungapped extension in accordance with a preferred, nonlimiting,illustrative embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In the following detailed description, numerous specific details are setforth in order to provide a thorough understanding of the invention.However, it will be understood by those skilled in the art that thepresent invention may be practiced without these specific details. Inother instances, well-known methods, procedures, and components have notbeen described in detail so as to obscure the present invention.

In an effort to improve upon the speed of biological similaritysearching, the inventors herein have developed a novel and unique methodand system whereby Stage 1 of the software pipeline known in the art asBLAST (Basic Local Alignment Search Tool), and more particularly theBLAST application BLASTN, is implemented via programmable logic device,e.g., reconfigurable hardware, such as, but not limited to, FPGAs (fieldprogrammable gate arrays). In addition, this present invention decidesin Stage 2 of the software pipeline, ungapped extension, whether eachw-mer emitted from word matching is worth inspecting by the morecomputationally intensive, gapped extension by shifting through patternmatches between query and database. It is important to identify anddiscard spurious w-mers as early as possible in the BLAST pipelinebecause later stages are increasingly more complex.

BLAST compares a query sequence to a biosequence database to find othersequences that differ from it by a small number of edits(single-character insertions, deletions, or substitutions). Becausedirect measurement of the edit distance between sequences iscomputationally expensive, BLAST uses a variety of heuristics toidentify small portions of a large database that are worth comparingcarefully to the biosequence query.

BLAST includes a pipeline of computations that filter a stream ofcharacters from a database to identify meaningful matches to a query. Tokeep pace with growing databases and queries, this stream must befiltered at increasingly higher rates. One path to higher performance isto develop a specialized processor that offloads part of BLAST'scomputation from a general-purpose processor, e.g., CPU. Illustrative,but nonlimiting, examples of processors that are known to accelerate orreplace BLAST include the ASIC-based Paracel GeneMatcher™ and theFPGA-based TimeLogic DecypherBLAST™ engine.

There is also a recently developed new design for an accelerator, i.e.,the FPGA-based MERCURY BLAST engine. This MERCURY BLAST engine utilizesfine-grained parallelism in BLAST's algorithms and the high input/output(I/O) bandwidth of current commodity computing systems to deliver aspeedup that is at least one (1) to two (2) orders of magnitude over thesoftware BLAST that can be deployed on a standard processor located in alaboratory.

The MERCURY BLAST engine is a multistage pipeline and this presentinvention involves word matching, i.e., a first stage (Stage 1), and anungapped extension stage, i.e., a second stage (Stage 2). The presentinvention includes a prefilter stage between the word matching stage andthe ungapped extension stage that sifts through pattern matches betweena query sequence and a biological database sequence and decides whetherto perform a more accurate, but computationally expensive, comparisonbetween them. The present invention also provides a more fruitfulacceleration of variable-length string matching that is robust tocharacter substitutions. The illustrative, but nonlimiting, preferredimplementation is compact, runs at high clock rates, and can process onepattern match for every clock cycle.

FIG. 1 shows a block diagram of a high level overview of a three (3)stage pipeline that is generally indicated by numeral 50. The databasesequences 52 are shown entering the pipeline 50. The first stage(Stage 1) 54 provides a word matching function that detects substringsof fixed length “w” in the stream that perfectly match a substring ofthe query, e.g., w=11, for DNA. These short matches are also known as“w-mers.” Each matching w-mer is forwarded to a second stage (Stage 2),which is an ungapped extension 56, which extends the w-mer on eitherside to identify a longer pair of sequences around the w-mer that matchwith at most a small number of mismatched characters. These longermatches are identified as high-scoring segment pairs (“HSP”s) orungapped alignments.

Finally, every high-scoring segment pair (HSP) that has both enoughmatches and sufficiently few mismatches is passed to the third stage(Stage 3), which is gapped extension 58, which preferably uses theSmith-Waterman dynamic programming algorithm to extend the high-scoringsegment pair (HSP) into a gapped alignment. Gapped alignment is a pairof similar regions that may differ by only a few arbitrary edits. Thesoftware, e.g., BLAST, reports only final gapped alignments with manymatches and few edits 60.

Research has determined that with the standard NCBI BLASTN softwareprogram, 83.9% of the computational time was spent in the first stage(Stage 1) 54 and 15.9% of the time was spent in the second stage (Stage2) 56. Therefore, to achieve a significant speedup, e.g., greater thansix (6) times, of BLASTN would require acceleration of both the firststage (Stage 1) 54 and the second stage (Stage 2) 56. NCBI is theNational Center for Biotechnology Information, which is amulti-disciplinary research group, composed of computer scientists,molecular biologists, mathematicians, biochemists, research physicians,and structural biologists concentrating on basic and applied research incomputational molecular biology utilizing a variety of softwaredatabases and tools.

A block diagram overview of the first stage (Stage 1) 54, referenced inFIG. 1, is shown in FIG. 2, which is a pipeline 100, e.g., BLASTN Stage1, in accordance with a preferred embodiment of the present invention.The input processing unit 104, the Bloom filter stage 106, the stringbuffer unit 108, the hashing stage 110, the hash (look-up) table 112,and the redundancy filter 114 are preferably implemented on aprogrammable logic device. An illustrative, but nonlimiting, example ofa programmable logic device 102 can include a field programmable gatearray (FPGA). A hash table 112 is preferably implemented on a memoryunit, e.g., static random access memory (SRAM) 118, which can beoff-chip from the programmable logic device 102.

The placement of the hash table 112 as being off-chip is optional and isonly a result of the memory constraints for field programmable gatearrays (FPGAs), wherein readily available FPGAs do not have sufficientcapacity to store the hash table and provide the functionality of thevarious pipeline stages. However, if capacity for a field programmablegate array (FPGA) sufficiently increases in the future, the hash table112 can also be implemented on an FPGA.

A preferred hardware platform for pipeline 100 is that disclosed in FIG.3 and is generally indicated by numeral 800. Additional details aboutthis hardware platform are disclosed in: (1) U.S. Pat. No. 6,711,558entitled “Associative Database Scanning and Information Retrieval”; (2)pending U.S. patent application Ser. No. 10/153,151, filed May 21, 2002and entitled “Associative Database Scanning and Information RetrievalUsing FPGA Devices”(published as U.S. patent application publication2003/0018630); (3) pending PCT application PCT/US04/16398 filed May 21,2004 and entitled “Intelligent Storage and Processing Using FPGADevices”; and (4) a paper on “Achieving Real Data Throughput for an FPGACo-Processor on Commodity Server Platforms” by Chamberlain et al.,published in Proc. of 1st Workshop on Building Block EngineArchitectures for Computers and Networks, Boston, Mass., Oct. 2004; anda paper on “Acceleration of Ungapped Extension in Mercury BLAST” byLancaster et al., was published on Nov. 12, 2005 for the Seventh(7^(th)) Workshop on Media and Streaming Processors held in conjunctionwith the Thirty-Eighth (38^(th)) International Symposium onMicroarchitecture (MICRO-38) in Barcelona, Spain, the entire disclosuresof all of which are incorporated herein by reference.

The input processing unit 104 is preferably configured to receive abiological sequence data stream 120 as an input from a sequencedatabase, which is shown in FIG. 2. This database stream 120 representsa database sequence that can be broken down into a plurality ofsubstrings, each substring having a base length “w.” Such substrings arereferred to herein as database w-mers. The input processing unit 104preferably outputs a parallel stream of q database w-mers (or sequencew-mers) 122i, wherein q represents the total number of database w-mersthat can be simultaneously processed by the Bloom filter stage 106.

A depiction of a sample portion of a biological sequence data stream isindicated by numeral 120, which is shown in FIG. 4. Stream 120 comprisesa sequence of bases (in this example, A, C, G, or T, which correspond tothe bases of a DNA sequence). In this example, each base can berepresented by 2 or 4 bits, as is known in the art. If w equals 11 and qequals 8, this stream 120 can be converted by the input processing unit104 into 8 parallel w-mers as shown in FIG. 5.

The Bloom filter stage 106, as shown in FIG. 2, operates to process eachdatabase w-mer to determine whether it is a possible match with a queryw-mer. A query w-mer being a substring of base length “w” derived from aquery sequence. By definition, Bloom filters will not produce any falsenegatives between the database w-mers and the query w-mers, but Bloomfilters are highly likely to produce some number of false positivematches between the database w-mers and the query w-mers. Because of thepresence of these false positives, the “yes”responses of the Bloomfilter to database w-mers are best referred to as possible matches.

Before processing database w-mers, the Bloom filters within Bloom filterstage 106 will need to be programmed/keyed with query w-mers from aquery sequence (or query string). The query sequence is some string ofbases for which a person wants to compare with the biological databasesequence to find matches. The query sequence can be broken down intomultiple w-mers as shown in FIG. 5 in connection with the databasesequence. For a given search, the value of w is the same for both thequery w-mers and database w-mers. Each Bloom filter within Bloom filterstage 106 is preferably programmed with all of the query w-mersdeveloped from the query sequence. In an illustrative, but nonlimiting,preferred embodiment, the Bloom filter stage 106 comprises a pluralityof parallel Bloom filters. Also, it is preferred, but not necessary,that the parallel Bloom filters are copies of one another. A Bloomfilter is said to be in parallel with another Bloom filter when both ofthose Bloom filters are configured to simultaneously process w-mers.Programming of the Bloom filter stage 106 can be achieved viaappropriate control commands asserted by the input processing unit 104(in response to control input to the pipeline 100).

A preferred Bloom filter architecture within the Bloom filter stage 106is shown in FIGS. 6(a)-6(c). FIG. 6(a) depicts an exemplary Bloom filter400. Within the Bloom filter 400, a database w-mer is processed by “k”hash functions 402, each hash function “i” operating to map the databasew-mer to a bit in the 1×m dual port block RAM (BRAM) unit 404i thatcorresponds to that hash function i. If all of the bits to which thatdatabase w-mer maps within BRAMs 4041 through 404 k are set (e.g., equalto 1), then logic unit 406, e.g., AND logic unit, will receive all 1'son its input lines, thereby producing a “yes” response as to whetherthat database w-mer is a possible match with a query w-mer.

The bits within the 1×m dual port BRAMs 404 are set if during the Bloomfilter programming process, a query w-mer maps to that bit location.Because the Bloom filter is programmed with all of the query w-mers fromthe query sequence, the output from logic unit 406, e.g., AND logicunit, will never produce a false negative, but may produce a falsepositive.

The hash functions 402 are preferably the same hash functions selectedfrom the H3 family of hash functions as used in the hashing stage 110.In the illustrative, but nonlimiting, example of FIG. 6(a), the 1×mmemory units 404 to which the hash functions 402 map can includedual-ported block random access memories (BRAMs). However, by virtue ofbeing dual ported, multiple Bloom filters can share access to the sameBRAM 404, thereby providing a substantial savings in the resourcesconsumed on the programmable logic device 102, e.g., FPGA, whenimplementing the Bloom filter stage 106.

A description of the operation of Bloom filters can be found in U.S.patent application Ser. No. 10/640,513, which was published as U.S.Patent Application No. 2005/0086520, and was filed on Aug. 14, 2003,which is incorporated herein by reference in its entirety.

As an additional realization of consumed resources, the dual port BRAMs404 are preferably double clocked to allow four Bloom filters 106 toshare the same dual port BRAM 404. Thus, during a given clock cycle, 4accesses can be made to each dual port BRAM 404i; thereby allowing eachdual port BRAM to support four Bloom filters 106 each clock cycle.

An arrangement 410 of four (4) Bloom filters 106 configured to share thesame k BRAMs 4041 through 404 k is shown in FIG. 6(b). Further still, ifit is feasible to triple clock the BRAMs 404, it is worth noting thateach dual port BRAM 404 could support six (6) Bloom filters.

A Bloom filter stage 106 that includes at least sixteen (16) Bloomfilters is shown in FIG. 6(c) formed from four (4) Bloom filterarrangements 410 that are shown in FIG. 6(b). Such a Bloom filter stage106 is capable of simultaneously processing a different database w-merthrough each parallel Bloom filter. As such, the Bloom filter stage 106can determine whether possible matches exist for multiple (sixteen (16)in a preferred, but nonlimiting, embodiment) database w-merssimultaneously. While the Bloom filter stage 106 comprises sixteen (16)parallel Bloom filters in a preferred embodiment, it should be notedthat more or less parallel Bloom filters can be used in the practice ofthe present invention, as would be understood by a person havingordinary skill in the art following the teachings presented herein.

The control logic that is either within Bloom filter stage 106 orassociated with the Bloom filter stage 106 can be readily configured toidentify the starting position within the database sequence of thedatabase w-mer that is a possible match with a query w-mer. Anillustrative, but nonlimiting, example of how this control logic couldbe configured with an illustrative flow chart diagram is shown in FIG. 7and is generally indicated by numeral 900.

The first step is for a counter to be updated each time a new clockcycle is received to reflect the new starting point within the databasesequence of w-mer for that clock cycle, as indicated by numeral 902.Depending on which logic unit 406i (or units), e.g., AND logic unit,finds a possible match 904, an appropriate starting position for apossibly matching w-mer can be recorded. If a possible match is found,then i can be recorded 906 and for each recorded value of i, return astarting point (SP) plus the value of i as the pointer for a newstarting position for a possible match in the database sequence 908. Theprocess then determines if there is a new clock cycle 910. If the answeris negative, then this determination will be repeated until there is apositive determination and the starting point (SP) can be increased by qindicated by process step 912, wherein q represents the total number ofdatabase w-mers that can be simultaneously processed by the Bloom filterstage 106. The process then returns to step 904 to determine if apossible match can be found.

If no match is found in process step 904, the process then determines ifthere is a new clock cycle 910. If the answer is negative, then thisdetermination will be repeated until there is a positive determinationand the starting point (SP) can be increased by q indicated by processstep 912, wherein q represents the total number of database w-mers thatcan be simultaneously processed by the Bloom filter stage 106. Theprocess then returns to step 904 to determine if a possible match can befound.

Each of these starting positions for a possible match is preferably thenpassed on to the string buffer unit 108, as shown in FIG. 2, as anoutput from the Bloom filter stage 106.

Optionally, an output 124i from the Bloom filter stage can comprise theactual database w-mer that triggered the possible match, either inaddition to or instead of the starting position pointer.

A preferred range of values for w is 5 to 15, with 11 being the mostpreferred value. However, it should be noted that values outside thisrange can also be used if desired by a practitioner of the invention. Apreferred value for q is 16. A preferred value for k is 6. Preferredvalues for m are 32 kbits or 64 kbits. However, other values for q, k,and m can be used if desired by a practitioner of the invention.Additional details about the preferred Bloom filter stage 106 as well asalternative embodiments are described in publications that werepreviously incorporated by reference.

The string buffer unit 108 operates to receive the parallel outputs 1241through 124q that comprise pointers that identify the startingposition(s) in the database sequence for each database w-mer that is apossible match with a query w-mer and/or the possibly matching databasew-mers themselves. String buffer unit 108 produces as an output 126 aserialized stream of pointers to these starting positions and/orpossibly matching database w-mers. Additional details about thepreferred string buffer unit 108 are described in publications that werepreviously incorporated by reference.

For each received database w-mer pointer or database w-mer within stream126, hashing stage 110 operates to confirm that a possible match existsbetween a database w-mer and a query w-mer and determine a pointer tothe query w-mer that may possibly match the database w-mer and is shownin FIG. 2. If stream 126 includes only pointers and not the databasew-mers themselves, then it is preferred that hashing stage 110 alsoreceive stream 120 so that the hashing stage can extract the appropriatepossibly matching database w-mers from stream 120 using the startingposition pointers in stream 126. Hash functions within the hash stageoperate to map each possibly matching database w-mer to a position inhash table 112.

An exemplary hashing stage 110 and hash table 112 in accordance with apreferred embodiment of the present invention is shown in FIG. 8. Thehash table 112 preferably comprises a primary table 602, a secondarytable 604, and a duplicate table 606. Stored at each address in primarytable 602 is preferably an entry that includes a pointer to the startingposition in the query sequence for the query w-mer that was mapped tothat address by the hash functions for hashing stage 110. For any queryw-mers that are duplicated at multiple locations within the querysequence, a duplicate bit in that query w-mer's corresponding entry inprimary table 602 is preferably set.

To build the hash table 112, hash functions are used to map each queryw-mer to an address in the hashing stage 110. The creation of these hashfunctions is preferably performed by software prior to streaming thedatabase w-mers through the pipeline 100. Once the hash functions arecreated by software, these hash functions are preferably loaded intohashing stage 110 during the programming process prior to streaming thedatabase w-mers through the pipeline. As new query biological sequencesare used to program the hashing stage 110, there is a displacement tableutilized by the hashing functions that is preferably updated to reflectthe new query w-mers. The hashing functions themselves can remain thesame. Thereafter, the hash table 112 can be programmed by processing thequery w-mers through the hashing stage 110 while an appropriate controlcommand is asserted. If desired, ordinary hash functions can be used tobuild the hash table 112 and process database w-mers in the hashingstage 110. However, because ordinary hashing typically produces multiplecollisions, and because the hash table 112 can be stored off theprogrammable logic device 102, e.g., FPGA, in memory 118, e.g., SRAM, itis desirable use hashing techniques other than ordinary hashing tominimize the number of memory look-ups, and thereby avoid a potentialpipeline bottleneck. In the hashing process, a collision occurs if whenbuilding a hash table 112, two different query w-mers map to the sameaddress in the primary table 602. With ordinary hashing, such collisionsare not uncommon. To resolve such collisions, a secondary table 604 ispreferably used that provides different addresses for the differentw-mers that mapped to the same address in the primary table 602, as iswell-known. A collision bit in the address of the primary table 602where the collision occurred is then set to identify the need to accessthe secondary table to properly map that w-mer.

A secondary set of hash functions are preferably used by the hashingstage 110 to map database w-mers to the secondary table 604. Thissecondary set of hash functions can be either ordinary hash functions,perfect hash functions, or near perfect hash functions, depending uponthe preferences of a practitioner of the present invention. In apreferred hashing operation, the hash functions that map database w-mersto the primary table 602 and the hash functions that map database w-mersto the secondary table 604 simultaneously process all received databasew-mers. If the address in the primary table 602 to which the databasew-mer maps has its collision bit set, then the hashing stage logic usesthe address in the secondary table 604 to which that database w-mermapped, thereby avoiding the need for the hashing stage to re-hash adatabase w-mer if that database w-mer maps to an address where acollision occurred.

In a first alternative embodiment that seeks to minimize the number ofhash table look-ups, perfect hashing functions are used to build thehash table 112 and perform the mapping of hashing stage 110. If perfecthashing is used to build hash table 112, then the secondary table 604 isnot needed. Perfect hashing functions are explained in greater detail insome of the publications incorporated by reference.

In a second alternative embodiment, there is a need to reduce the numberof hash table look-ups so that near perfect hashing functions are usedto build the hash table 112 and perform the mapping of hashing stage110. Near perfect hashing can be defined as hashing that operates toprovide a decreased likelihood of collisions relative to the likelihoodof collisions provided by ordinary hashing functions, and wherein thehashing approaches a perfect hash the longer that it executes. Nearperfect hashing functions are explained in greater detail in some of thepublications incorporated by reference. The preferred near perfecthashing functions are those in the family H3 chosen to be of full rank.The family of hashing functions that is designated as “H3” as well asthe concept of “full rank” are fully described and disclosed in numerouspublications in this area. More generally, the near-perfect hashingalgorithm described herein may be used to generate a mapping of a set ofbinary strings of common length equal to n, referred to as keys, toindices into a hash table, so that few or no pairs of keys map to thesame location in the table. The hashing logic of hashing stage 110 ispreferably implemented on the programmable logic device 102, e.g., FPGA,to map possibly matching database w-mers to their respective positionsin the query sequence.

If the hashing stage 110 maps a possibly matching database w-mer to anempty entry in the hash table 112, then that database w-mer can bediscarded as a false positive. If the hashing stage 110 maps a possiblymatching database w-mer to an entry in the hash table 112 that containsa pointer to a position in the query sequence, then neither theduplicate bit nor the collision bit will be set. Then that pointerpoints to the only starting position in the query sequence for the queryw-mer that is a possible match to that database w-mer.

If the hashing stage 110 maps a possibly matching database w-mer to anentry in the hash table 112 for which the collision bit is set, then thehashing stage 110 looks to the address in the secondary table 604 wherethe secondary hash function mapped that possibly matching databasew-mer. Once the proper entry in the secondary table 604 for the databasew-mer is identified, then the secondary table entry is processed in thesame manner as entries in the primary table (whose collision bits arenot set).

If the hashing stage 110 maps a possibly matching database w-mer to anentry in the hash table 112 that contains a pointer to a position in thequery sequence, and the duplicate bit is set, then that pointer pointsto one of a plurality of starting positions in the query sequence forthe query w-mer that is a possible match to that database w-mer. Toobtain the starting position(s) for each duplicate query w-mer, theduplicate table 606 is then preferably accessed. Also included in eachprimary table entry whose duplicate bit is set is a pointer to anaddress in the duplicate table 606 for that query w-mer. At thataddress, the duplicate table address preferably includes an entry thatidentifies the total number of duplicate query w-mers that exist in theduplicate table 606 for a given query w-mer. This total number ispreferably followed by a contiguous array comprising, for each w-merthat is a duplicate of the query w-mer at issue, a pointer to thestarting position in the query sequence for that duplicate query w-mer.Alternatively, linked list techniques can be used to chain the duplicatew-mer pointers together. Further still, the system can be configured tocancel a query sequence search if, during programming of the hash table110, the number of duplicate w-mers in the query sequence issufficiently large to exceed the capacity of the duplicate table 606.

Therefore, the hashing stage 110 operates to identify for each databasew-mer that is a possible match to a query w-mer, the starting positionin the query sequence for each query w-mer that is a possible matchthereto. The preferred output 128 from the hashing stage 110 is a streamof ordered pairs, where each ordered pair comprises a pointer to thestarting position in the database sequence of the database w-mer that isa possible match with a query w-mer and a pointer to a starting positionin the query sequence of the query w-mer that possibly matches thedatabase w-mer. If desired, comparison logic (not shown) can beoptionally implemented on the programmable logic device 102, e.g., FPGA,following the hashing stage that operates to compare the database w-merand query w-mer for each ordered pair to eliminate false positives.

It is worth noting that the role of the duplicate table 606 can beremoved from the hash table 112 and placed either in separate memory,e.g., SRAM, or on the programmable logic device 102, e.g., FPGA, as itsown pipeline stage downstream from the hashing stage 110 if desired by apractitioner of the invention.

The redundancy filter stage 114 is preferably configured to removeordered pairs from stream 128 that are deemed redundant with otherordered pairs within stream 128. The database-to-query w-mer matchesthat are deemed redundant and non-redundant on the basis of auser-specified redundancy threshold N as well as the preferred design ofa redundancy filter stage 114 is described in documents that werepreviously incorporated by reference. The output 130 from redundancyfilter stage 114 is preferably a stream of ordered pairs as with stream128, with the exception that the ordered pairs for matches that aredeemed redundant with other ordered pairs are not present therein.

Also, it is worth noting that the implementation on an FPGA of the FPGApipeline stages for the programmable logic device 102 described hereinis within the skill of a person having ordinary skill in the artfollowing the teachings herein and the teachings of (1) U.S. Pat. No.6,711,558 entitled “Associative Database Scanning and InformationRetrieval”; (2) pending U.S. patent application Ser. No. 10/153,151,filed May 21, 2002 and entitled “Associative Database Scanning andInformation Retrieval Using FPGA Devices” (published as U.S. patentapplication publication 2003/0018630); and (3) pending PCT applicationPCT/US04/16398 filed May 21, 2004 and entitled “Intelligent Storage andProcessing Using FPGA Devices”; the entire disclosures of all of whichhave been incorporated herein by reference. For example, see thedescription for FIG. 8 in PCT application PCT/US04/16398.

After the first stage (Stage 1) for word matching 54, being accelerated,as previously described and shown in FIG. 1, the performance of thesecond stage (Stage 2) that relates to the ungapped extension 56directly determines the performance of the overall pipelinedapplication.

The purpose of extending a w-mer is to determine, as quickly andaccurately as possible, whether the w-mer arose by chance alone, orwhether it may indicate a significant match. Ungapped extension in thesecond stage (Stage 2) 56 must decide whether each w-mer emitted fromword matching is worth inspecting by the more computationally intensive,gapped extension. It is important to identify and discard spuriousw-mers as early as possible in the BLAST pipeline because later stagesare increasingly more complex. There exists a delicate balance betweenthe stringency of the filter and its sensitivity, i.e. the number ofbiologically significant alignments that are found. A highly stringentfilter is needed to minimize time spent in fruitless gapped extension,but the filter must not discard w-mers that legitimately identify longquery-database matches with few differences. In the illustrative, butnonlimiting, embodiment of an implementation of a programmable logicdevice 102, e.g., FPGA, this filtering computation must also beparallelizable and simple enough to fit in a limited area.

The implementation of the second stage (Stage 2) 56 builds on theconcepts utilized in the implementation of the first stage (Stage 1) 54described above and disclosed in P. Krishnamurthy, J. Buhler, R. D.Chamberlain, M. A. Franklin, K. Gyang, and J. Lancaster, BiosequenceSimilarity Search On The Mercury System, In Proceedings of the 15th IEEEInternational Conference on Application-Specific Systems, Architectures,and Processors (ASAP04), Pages 365-375 (2004), which is incorporatedherein by reference in its entirety.

The second stage (Stage 2) 56 deploys an ungapped extension stageutilizing a programmable logic device, e.g., FPGA 74, which operates asa prefilter stage. This design utilizes the speed of the implementationof the programmable logic device, e.g., FPGA 74, to greatly reduce thenumber of w-mers passed to software while retaining the flexibility ofthe software implementation on those w-mers that pass in the pipeline70.

Preferably, but not necessarily, a w-mer must pass the second stage(Stage 2) 56 ungapped extension, which may utilize both hardware andsoftware, before being released to the third stage (Stage 3) gappedextension 58. By focusing on data streaming nature of the applicationwith overall throughput and not latency, the addition of anotherprocessing stage is very useful.

Utilizing a first illustrative software program, e.g., NCBI BLASTN, thesecond stage (Stage 2) 56 is now described. The ungapped extension of aw-mer into an HSP runs in two steps utilizing NCBI BLASTN. The w-mer isextended back toward the beginnings of the two sequences, then forwardtoward their ends. As the HSP extends over each character pair, thatpair receives a reward +α if the characters match or a penalty −β ifthey mismatch. An HSP's score is the sum of these rewards and penaltiesover all its pairs. The end of the HSP in each direction is chosen tomaximize the total score of that direction's extension. If the final HSPscores above a user-defined threshold, it is passed on to the thirdstage (Stage 3), gapped extension 58.

For long biological sequences, it is useful to terminate ungappedextension before reaching the ends of the sequences, especially if nohigh scoring HSP is likely to be found. In the illustrative software,BLASTN, implements early termination by an X-drop mechanism. Thealgorithm tracks the highest score achieved by any extension of thew-mer thus far; if the current extension scores at least X below thismaximum, further extension in that direction is terminated. Ungappedextension with X-dropping allows the BLASTN software to recover HSPs ofarbitrary length while limiting the average search space for a givenw-mer. However, because the regions of extension can in principle bequite long, this heuristic is not as suitable for fast implementation ina typical programmable logic device 74, e.g., FPGA. Note that eventhough extension in both directions can be done in parallel, this wasnot sufficient to achieve a desired speedup on processing.

The prior processing of the second stage (Stage 2) 56 utilizing ungappedextension is shown in FIG. 9 utilizing NCBI BLAST. The parameters usedhere are w=5, α=1, β=−3, and X-drop=10. W is the length of a substring,a is a positive reward for character match, β is a negative penalty forcharacter mismatch and X-drop is the reduction below the maximum levelof current extension scores where further extension is terminated. Theuse of X-dropping allows the software to recover HSPs of an arbitrarylength while limiting the average search space for a given w-mer but itis still not very fast or efficient.

The ungapped extension analysis with a preferred software program, i.e.,NCBI BLAST, is generally indicated by numeral 62 and begins at the endof the w-mer and extends left. The extension stops when the runningscore drops 10 below the maximum score (as indicated by the arrows).This is also indicated as the user modifiable “X-drop” parameter. Thesame computation is then performed in the other direction. The finalsubstring is the concatenation of the best substrings from the left andright extensions.

In the present invention, a prefilter is utilized to improve processingefficiency. The extension algorithm for a given w-mer is performed in asingle forward pass over a fixed size window. This extension begins bycalculating the limits of a fixed window of length L_(w), which iscentered on the w-mer in both the query and database stream. Afterward,the appropriate substrings of the query and the database stream are thenfetched into buffers. Once the substrings are buffered, then theextension algorithm begins. The extension algorithm in pseudo code isshown on FIG. 12 and generally indicated by numeral 66 and theassociated flow chart is shown on FIG. 20 and is generally indicated bynumeral 500. This algorithm 66 also lends itself to hardwareimplementation despite the sequential expression of the computation.

This prefilter extension algorithm implements a dynamic programmingrecurrence that simultaneously computes the start and end of the besthigh-scoring segment pair (HSP) in a predetermined window. The firststep 502 is to calculate predetermined window boundaries. The currentposition within a window is indicated by a variable “i ”. There is thena reset of the score “γ” of the best high-scoring segment pair (HSP)that terminates at position i of the window and the score “Γ” of thebest high-scoring segment pair (HSP) ending at or before i to zero (0).This is followed by a reset of the variable “B” as well as the twoendpoints Bmax and Emax of the best high scoring segment pair (HSP)ending at or before i to the value of zero (0).

First, for every position i in a window up to the length of the windowL_(w), a determination is made as to whether characters in the windowmatch 504 and the score contribution of each character pair in thewindow is computed, using a reward +α, indicated by numeral 506 and apenalty −β, indicated by numeral 508 as the implementation for providingrewards for pattern matches and penalties for pattern mismatches,respectively. These contributions can be calculated independently inparallel for each pair. If there is a reward +α, it is added to thescore γ, i.e., γ=γ+α. and if there is a penalty −β, it is subtractedfrom the score, i.e., γ=γ−β.

Then, the algorithm determines if the score y of the best high-scoringsegment pair (HSP) is greater than zero (0) 510. If this determinationis positive, then there is a determination if the score “γ” of the besthigh-scoring segment pair (HSP) that terminates at position i of thewindow is greater than the score “Γ” of the best high-scoring segmentpair (HSP) ending at or before i and if i (position in the window) isgreater than the end of a short word match w-mer (WmerEnd) 511.

If this determination in step 510 is negative, then there is adetermination of whether i is less than the start of the short wordmatch w-mer (WmerStart) 512. If i is less than the start of the shortword match w-mer, then the variable B is set to the current positionwithin a window i incremented by one (1) and the score “γ” of the besthigh-scoring segment pair (HSP) is set to (0) zero 516. The next step isthen a determination if i is equal to the length of the window L_(w)519. If the determination in step 511 is positive, then the score “F” ofthe best high-scoring segment pair (HSP) ending at or before i is set tothe score “γ” of the best high-scoring segment pair (HSP) thatterminates at i, the endpoint B_(max) of the best high-scoring segmentpair (HSP) “Γ”ending at or before i is set to the variable B and theendpoint Emax of the best high-scoring segment pair (HSP) is set to i(position in the window) 514. The next step is then a determination if iis equal to the length of the window L_(w) 519. Moreover, if thedetermination in step 511 is negative, then there is also adetermination if i is equal to the length of the window L_(w) 519.

If the determination in step 519 is negative, then the variable i isincremented by one (1) in step 517 and the process is returned to step504. If this determination in step 519 is positive, then the softwareprogram proceeds to the next program step 518, which is a determinationwhether the score of the best high-scoring segment pair (HSP) “Γ” isgreater than the variable T in step 518. If this determination ispositive, then a value of “true” is returned 520 and the softwareprogram ends 527.

If the determination in step 518 is negative, this is followed by adetermination of whether the position of the endpoint Bmax of the besthigh-scoring segment pair (HSP) is equal to zero 522. If thisdetermination is positive, then a value of “true” is returned 526 andthe software program ends 527.

Finally, if the determination in step 522 is negative, there is adetermination of whether the endpoint E_(max) of the best high-scoringsegment pair (HSP) is equal to the length of the window L_(w) 524. Ifthis determination is positive, then a value of “true” is returned 526and the algorithm ends 527 and if this determination is negative, then avalue of “false” is returned 528 and the algorithm will again end 527.

In summary, the algorithm tracks Γ, which is the score of the highestscoring HSP ending before i, along with its endpoints B_(max) andE_(max). Therefore, if Γ_(Lw) is greater than a user-defined thresholdscore, the w-mer passes the prefilter and is forwarded to (Stage 2b) 84for the ungapped extension.

There are two aspects of this prefilter algorithm. The first aspect isthat the recurrence requires that the HSP found by the algorithm passthrough its original matching w-mer and that a higher-scoring HSP in thewindow that does not contain this w-mer is ignored. This constraintensures that, if two distinct biological features appear in a singlewindow, the w-mers generated from each have a chance to generate twoindependent HSPs. Otherwise, both w-mers might identify only the featurewith the higher-scoring HSP, causing the other feature to be ignored.

The second aspect is if the highest scoring HSP intersects the bounds ofthe window, it is passed on to software regardless of the score. Thisheuristic ensures that HSPs that might extend well beyond the windowboundaries are properly found by software, which has no fixed sizewindow limits, rather than being prematurely eliminated.

An illustrative example of the ungapped extension prefilter utilizingthe illustrative Mercury BLAST is shown in FIG. 10. Using the sameparameters of FIG. 9 with the addition of a new parameter L_(w), whichis the length of a predetermined window, then L_(w)=19, w=5, α=1, β=−3,the present invention, e.g., Mercury BLAST, ungapped extension, asgenerally indicated by numeral 65, begins at the leftmost base of thewindow (indicated by brackets) and moves right, calculating thebest-scoring substring in the window. In this example, the algorithmsgave the same result in both FIGS. 9 and 10. However, this is notnecessarily the situation in general.

An illustrative system of the present invention is shown in schematicform as the application pipeline and is generally indicated by numeral70 is shown in FIG. 11. This system is preferably an infrastructuredesigned to accelerate disk-based computations and exploits the highinput/output (I/O) bandwidth available from modern disks by streamingdata 52 directly from the disk medium to programmable logic devices suchas, but not limited to, FPGAs (field programmable gate arrays) 74.

The streaming data is delivered to the hardware word matching prefilter76, which is designated as Stage 1a. After the data stream isprefiltered, it passes into a word matching module 80, as previouslydescribed above and is designated as Stage 1b. The word matching module80 utilizes memory 78, e.g., SRAM (static random access memory thatretains data bits in its memory as long as power is being supplied anddoes not have to be periodically refreshed). The combination of Stage 1aand 1b, 76 and 80 form the previously described first stage (Stage 1)54, that was previously shown in FIG. 1.

The data then passes into an ungapped extension prefilter 82 whichincludes dedicated hardware and/or firmware, which is designated asStage 2a. The output of the prefilter then passes into the remainder ofStage 2, which is the ungapped extension 84, which is designated asStage 2b and includes software. The combination of Stage 2a and 2b, 82and 84 form the previously described second stage (Stage 2) 56,previously shown in FIG. 1.

In the alternative, if there are enough hardware/firmware resourcesavailable, the software ungapped extension 84, which is designated asStage 2b, can be eliminated with the hardware/firmware ungappedextension prefilter 82 functioning as the sole ungapped extension, whichis designated as Stage 2and shown in FIG. 11A. This methodology canprovide higher throughput since the software portion of the ungappedextension can be eliminated in the utilization of the biologicalsequence similarity search computation, e.g., BLAST, search.

The data stream finally passes into the gapped extension 58, which isdesignated as Stage 3. The ungapped extension and gapped extension canbe operated by a wide variety of computing mechanisms, including, butnot limited to, a processor 88. The prefilter (Stage 2a) 82 lends itselfto hardware implementation despite the sequential expression of thecomputation in FIG. 12.

The ungapped extension prefilter design 82 is fully pipelined internallyand accepts one match per clock. The prefilter is parameterizable withthe parameters chosen based on a number of design-specific constraints.The commands are supported to configure parameters such as a matchscore, a mismatch score, and a cutoff threshold. Therefore, there is atrade-off between sensitivity and throughput that is left to thediscretion of the user.

Since the w-mer matching stage generates more output than input, twoindependent data paths are used for input into the ungapped extension,i.e., Stage 2a, which is generally indicated by numeral 82 and is shownin FIG. 13. The w-mers/commands and the data are parsed with thew-mers/commands are received on one path 92, and the data from thedatabase is received on the other path 93. The commands are supported byan active control valid signal and the commands are also transmitted ina predetermined format. Also, the commands are designed so that theprogrammable logic device 74, e.g., FPGA, shown in FIG. 11, does notneed to be reconfigured.

The module is organized into three (3) pipelined stages. This includesan extension controller 94, a window lookup module 95 and a scoringmodule 98. There is an extension controller 94 that parses the input todemultiplex the shared w-mers command 92 and database stream 93. Allw-mer matches and the database stream flow through the extensioncontroller 94 into the window lookup module 95. The window lookup module95 is the module that is responsible for fetching the appropriatesubstrings of the database stream and the query to form an alignmentwindow.

The window lookup module 95 is illustrated in additional structuraldetail in FIG. 13A. Preferably, but not necessarily, a query is storedon-chip utilizing dual-ported random access memory (RAM) on theprogrammable logic device 74, e.g., FPGA, shown in FIG. 11. The query isstreamed 202 at the beginning of each biological sequence similaritysearch computation, e.g., BLAST search, and is fixed until the end ofthe database is reached. The size of the query stream 202 is limited tothe amount of memory, e.g., block random access memories (RAMs) that areprovided in the query buffer 204. The database is also streamed 206 in asimilar manner as the query stream 202 and provided to a database buffer208. The database buffer 208 is preferably, but not necessarily, acircular buffer. The use of a circular buffer is helpful to retain anecessary section of the database stream 206 that any windows that areformed from the arriving w-mers might require. Since the w-mergeneration is performed in the first stage 54, as shown in FIG. 1, onlya small portion of the database stream 206 needs to be buffered in thedatabase buffer 208 in order to accommodate each extension request.

The database buffer 208 is preferably, but not necessarily, built toallow a fixed distance of w-mers that are out-of-order to be processedcorrectly. For the previously referenced BLASTP implementation, this isimportant since the w-mers in the word matching stage 80, as shown inFIG. 11, will possibly be out-of-order.

The window lookup module 95 preferably, but not necessarily, isorganized as a six (6) stage pipeline that is shown in FIG. 13A. Thisincludes a first stage 210, a second stage 212, a third stage 214, afourth stage 216, a fifth stage 218 and a sixth stage 220. Extensivepipelining may be necessary to keep up with requests from the previousstage, i.e., extension controller 94 shown in FIG. 13, which may comeonce per clock pulse.

The first stage of the pipeline 210 calculates the beginning of thequery and database windows based on the incoming w-mer and aconfigurable window size. The offset of the query 222 and the offset ofthe database 224 is provided to the first stage 210. This first stage ofthe pipeline 210 calculates the beginning of the query window anddatabase windows based on the incoming w-mer and a configurable windowsize. The offset of the beginning of the window is then passed to thebuffer modules, i.e., the second stage 212, the third stage 214, thefourth stage 216 as well as the fifth stage 218, which begins the taskof reading a superset of the window from memory, e.g., block RAMs. Anextra word of data must be retrieved from the memory, e.g., block RAMs,because there is no guarantee that the window boundaries will fall on aword boundary. Therefore, an extra word is fetched on each lookup sothat the exact window can be constructed from a temporary buffer holdinga window size worth of data plus the extra word. The next four (4)stages, i.e., the second stage 212, the third stage 214, the fourthstage 216 and the fifth stage 218, move the input data lock-step withthe buffer lookup process and ensure that pipeline stalls are handled ina correct manner. Finally, the superset of the query and databasewindows arrive to the final stage or sixth stage for format output 220.The output includes a seed out 226, a query window 228 and a databasewindow 230 so that the correct window of the buffers is extracted andregistered as the output.

In an illustrative, but nonlimiting, embodiment, the query is bufferedon-chip using the dual-ported block random access memory (BRAM) on toprogrammable logic devices such as, but not limited to, FPGAs (fieldprogrammable gate arrays) 74, as shown in FIG. 11. Preferably, but notnecessarily, the database stream is preferably buffered in a circularbuffer, which can be created from BRAMs. As previously mentioned, thew-mer generation is done in the first stage for the extension controller94. Only a small amount of the database stream 93 needs to be bufferedto accommodate each input w-mer because they arrive from Stage 1a, 76,and Stage 1b, 80, in order with respect to the database stream 93.

Since the illustrative, but nonlimiting, dual-ported block random accessmemories (BRAMs) are a highly-utilized resource in Stage 1a, 76, andStage 1b, 80, the memories, e.g., dual-ported block random accessmemories BRAMs, are time-multiplexed to create a quad ported BRAMstructure. After the window is fetched, it is passed into the scoringmodule 98 and stored in registers. The scoring module 98 implements therecurrence of the extension algorithm 66 on FIG. 12. Since thecomputation is too complex to be done in a single cycle, the scorer isextensively pipelined.

The first stage of the scoring pipeline 98 is shown in FIG. 13. The basecomparator 96 receives every base pair in parallel registers as shown bynumerals 302 and 304, respectively as shown in FIGS. 13 and 14. Thesepairs are fed into a plurality of comparators 306 that assign acomparison score to each base pair in the window. In the illustrativebut preferred embodiment, for DNA sequences, e.g., BLASTN, the basecomparator 96 assigns a reward +α, generally indicated by numeral 310,to each matching base pair and a penalty −β, generally indicated bynumeral 312 to each mismatching pair. The score computation is the samefor protein sequence analysis, e.g., BLASTP, except there are many morechoices for the score. In BLASTP, the reward +α and the penalty −β arereplaced with a value from a lookup table that is indexed by theconcatenation of the two symbols 302, 304. The calculation of all of thecomparison scores is performed in a single cycle using L_(w)comparators. After the scores are calculated, the calculated scores aredelivered to later scoring stages 309.

The scoring module 98 is preferably, but not necessarily, arranged as aclassic systolic array. The data from the previous stage are read oneach clock pulse and results are output to the following stage on thenext clock pulse. Storage for comparison scores in successive pipelinestages 97 decrease in every successive stage and is shown in FIG. 13.This decrease is possible because the comparison score for windowposition “i” is consumed in the ith pipeline stage and may then bediscarded, since later stages inspect only window positions that aregreater than i.

This structure of data movement is shown in more detail in FIG. 15,which is generally indicated by numeral 97 as a systolic array ofscoring stages. The darkened registers 326 hold the necessary comparisonscores for the w-mer being processed in each pipeline stage. AlthoughFIG. 15 shows a single comparison score being dropped for each scorerstage for ease of explanation, the preferred embodiment includesdropping two comparison scores per stage. There is also a plurality ofpipeline scoring stages 330. The data flows from left to right on FIG.15. The light registers 328 are pipeline calculation registers and areused to transfer the state of the recurrence from a previous scoringstage to the next with each column of registers containing a differentw-mer in the pipeline 70. Initialization 322 is provided to the lightregisters 328, dark registers 326 and scoring stages 330.

The interface of an individual scoring stage is generally indicated bynumeral 330 and is shown in FIG. 16. The values shown entering the topof the scoring stage 330 are the state of dynamic programming recurrencepropagated from the previous scoring stage. These values are read asinput to a first register 342 and the results are stored in the outputregisters 364 shown on the right. Each scoring stage 362 in the pipeline97 contains combinational logic to implement the dynamic programmingrecurrence shown in Lines 12-19 of the algorithm described in FIG. 12.This includes a value for γ indicated by numeral 344, a value for Bindicated by numeral 346, a value for Γ indicated by numeral 348, amaximum value of B, i.e., B_(max), indicated by numeral 350 and amaximum value of E, i.e., E_(max), indicated by numeral 352.

The data entering from the left of the second register 360 are thecomparison scores 358, i.e., α/β, and the database offset 354 and queryoffset 356 for a given w-mer, which are independent of the state of therecurrence. In order to sustain a high clock frequency design, eachscoring stage computes only two iterations of the loop per clock cycle,resulting in L_(w)/2 scoring stages for a complete calculation. Thisprovides step 1, indicated by numeral 361 and step 2, indicated bynumeral 363, which implements two iterations of Lines 12 through 19 ofthe algorithm 66 shown in FIG. 12. This is merely an illustrative, butnonlimiting, number of loop iterations within a single stage since anynumber of loop iterations can be utilized. This tradeoff is preferredwhen contrasting area and speed utilizing the preferred andillustrative, but nonlimiting, hardware. Therefore, there are L_(w)/2independent w-mers being processed simultaneously in the scoring stages362 of the processor when the pipeline 70 is full.

The pipeline calculation registers output 364 that are provided tosubsequent scoring stages, e.g., i +1. This includes a value for yindicated by numeral 366, a value for B indicated by numeral 368, avalue for Γ indicated by numeral 370, a maximum value of B, i.e.,B_(max), indicated by numeral 372 and a maximum value of E, i.e.,E_(max), indicated by numeral 374.

The final pipeline stage of the scoring module is the thresholdcomparator 99 that is shown in FIG. 13. The comparator takes thefully-scored segment and makes a decision to discard or keep thesegment. This decision is based on the score of the alignment relativeto a user-defined threshold T, as well as the position of thehighest-scoring substring. If the maximum score is above the threshold,the segment is passed on. Additionally, if the maximal scoring substringintersects either boundary of the window, the segment is also passed on,regardless of the score. If neither condition holds, the substring of apredetermined length, i.e., segment, is discarded. The segments that arepassed on are indicated by numeral 100 on FIG. 13.

In an optional and illustrative implementation, the ungapped extensionprefilter 82, as shown in FIG. 13, utilizes approximately 38% of thelogic cells and 27 BRAMs on a reconfigurable hardware such as, but notlimited to, FPGAs (field programmable gate arrays) 74. An illustrative,but nonlimiting, example includes a Xilinx® Virtex-II 6000® series FPGA.Xilinx, Inc., is a Delaware Corporation, having a place of business at2100 Logic Drive, San Jose, Calif. 95124-3400. This includes theinfrastructure for moving data in and out of the reconfigurablehardware, e.g., FPGA, itself. The illustrative, but nonlimiting, currentdesign runs at 96 MHz, processing one w-mer per clock. The full MercuryBLASTN design utilizes approximately 65% of the logic cells and 134BRAMs.

There are many aspects to the performance of an ungapped extensionprefilter 82, as shown in FIG. 13. First is the individual stagethroughput. The ungapped extension stage (Stage 2a) 82 must run fastenough to not be a bottleneck in the overall pipeline 70. Second, theungapped extension stage (Stage 2a) 82 must effectively filter as manyw-mers as possible, since downstream stages are even morecomputationally expensive. Finally, the above must be achieved withoutinadvertently dropping a large percentage of the significant alignments(i.e., the false negative rate must be limited).

First, throughput performance for the ungapped extension stage alone isa function of data input rate. In an illustrative, but nonlimiting,example, the ungapped extension stage accepts one w-mer per clock andruns at 96 MHz. Hence the maximum throughput of the filter is one (1)input match/cycle times 96 MHz=96 Mmatches/second. This provides aspeedup of twenty-five (25) over the software ungapped extension 82executed on the previously described baseline system, as shown inFIG. 1. This analysis is based solely on modeling, based on assumedparameters and constraints, and actual results can vary.

FIG. 17 shows the application throughput (quantified by the ingest rateof the complete pipeline 70, in million bases per second) as a functionof the performance attainable in the second stage (Stage 2) 56, as shownin FIG. 1, which is quantified by the ingest rate of the second stage(Stage 2) 56 alone, in million matches per second). FIG. 17 includesresults for both 25,000-base double stranded queries and 20,000-basedouble stranded queries.

FIG. 18 plots the resulting speedup versus throughput (for both querysizes) for the preferred embodiment of the present invention with aprefiltering stage (Stage 2a) 82 as shown on FIG. 11 relative to thesoftware implementation. Once again, this analysis is based solely onmodeling, based on assumed parameters and constraints, and actualresults can vary. The software profiling shows, for 25,000-base queries,an average execution time for the second stage (Stage 2) 56 alone of0.265 μs/match. This corresponds to a throughput of 3.8 Mmatches/s,plotted towards the left in FIGS. 17 and 18. As the performance of thesecond stage (Stage 2) 56 is increased, the overall pipeline 70performance increases proportionately until the second stage (Stage 2)56 is no longer the bottleneck stage.

To explore the impact this performance in the second stage (Stage 2) 56has on the overall system, we return to the graphs of FIGS. 17 and 18.Above approximately 35 Mmatches/s, the overall system throughput is atits maximum rate of 1400 Mbases/s, with a speedup of forty-eight (48times over that of software, e.g., NCBI BLASTN, for a 25,000-base query)and a speedup of thirty-eight (38 times over that of software, e.g.,NCBI BLASTN, for a 20,000-base query).

There are two (2) parameters of ungapped extension that affect itssensitivity. The first parameter is the score threshold used, whichaffects the number of HSPs that are produced. The second parameter isthe length of the window, which can affect the number of false negativesthat are produced. The HSPs that have enough mismatches before thewindow boundary to be below the score threshold but have many matchesimmediately outside the boundary will be incorrectly rejected.

To evaluate the functional sensitivity of hardware ungapped extension82, measurements were performed using an instrumented version ofsoftware, e.g., NCBI BLASTN. A software emulator of the new ungappedextension algorithm was placed in front of the standard ungappedextension stage, e.g., NCBI. Then, the statistics were gathered whichshow how many w-mers arrived at the ungapped extension prefilter stage82, and how many passed.

These statistics were collected both for ungapped extension, the secondstage (Stage 2) 56, FIG. 1, alone, e.g., NCBI BLASTN, and with thehardware emulator in place. The dataset was generated from the human andmouse genomes. The queries were statistically significant samples ofvarious sizes (e.g., 10 kbase, 100 kbase, and 1 Mbase). The databasestream was the mouse genome with low-complexity and repetitive sequencesremoved.

FIG. 19 is a table, generally indicated by numeral 380, which summarizesthe results for a window size of 64 bases, which is the window size usedin the illustrative, but nonlimiting hardware implementation. A scorethreshold of twenty (20) corresponds to the default value in ungappedextension or second stage (Stage 2) 82, FIG. 11, e.g., NCBI BLASTN. Thereject fraction is the measured ratio of output HSPs over input w-mers.This value quantifies the effectiveness of the overall second stage(Stage 2) 82, FIG. 11, at filtering w-mers so they need not be processedin (Stage 2b) 84. The percent found is the percentage of gappedalignments present in the MERCURY BLAST's output relative to NCBI BLASTNoutput, as shown in FIG. 19.

Using a window length of 64 bases, the ungapped extension prefilter 82is able to filter out between 99.5% and 99.9% of all its input. Forinstance, a threshold of seventeen (17) missed HSPs provides a goodtradeoff between a high reject fraction while keeping the vast majority(99.95%) of the significant HSPs. This translates into five (5) missedHSPs out of 10,334 HSPs found by software system, e.g., NCBI BLAST.

Therefore, biosequence similarity searching can be acceleratedpractically by a pipeline 70 designed to filter high-speed streams ofcharacter data utilizing a performance-critical ungapped extension stage56. The present invention is a highly parallel and pipelinedimplementation yields results comparable to those obtained fromsoftware, such as BLASTN, while running twenty-five times faster andenabling the entire accelerator to run approximately forty to fiftytimes faster.

For example, the techniques of the present invention can be used toimplement word matching for seeded alignment searching techniques otherthan biological sequence similarity searching. Furthermore, thetechniques of the present invention can also be applied to BLASTversions other than BLASTN, such as BLASTP, BLASTX, TBLASTN, andTBLASTX. Further still, the preferred embodiment has been describedwherein the starting position of each matching w-mer within either thedatabase sequence or the query sequence is determined by the variouspipeline stages. However, it should be noted that these positions neednot necessarily be the starting position. For example, the endingposition or some intermediate position can also be used to identify thepositions of the matching w-mers within the database and querysequences. These changes and modifications should be considered asalternative embodiments for the present invention, and the inventionshould be considered as limited only by the scope of the claims appendedhereto and their legal equivalents.

Thus, there has been shown and described several embodiments of a novelinvention. As is evident from the foregoing description, certain aspectsof the present invention are not limited by the particular details ofthe examples illustrated herein, and it is therefore contemplated thatother modifications and applications, or equivalents thereof, will occurto those skilled in the art. The terms “have,” “having,” “includes” and“including” and simi foregoing specification are used in the sense of“optional” or “may include” and not as “required.” Many changes,modifications, variations and other uses and applications of the presentconstruction will, however, become apparent to those skilled in the artafter considering the specification and the accompanying drawings. Allsuch changes, modifications, variations and other uses and applicationswhich do not depart from the spirit and scope of the invention aredeemed to be covered by the invention which is limited only by theclaims that follow.

1. A digital logic circuit for performing biological sequence similaritysearching, the circuit comprising: a programmable logic deviceconfigured to include a pipeline comprising a Bloom filter stage, theBloom filter stage being configured to receive a data stream comprisinga plurality of biological sequence data strings and filter thebiological sequence data stream to identify a plurality of possiblematches between the sequence data strings and a plurality of substringsof a query string.
 2. The circuit of claim 1, wherein the programmablelogic device comprises a FPGA.
 3. The circuit of claim 2, wherein thebiological sequence similarity searching pipeline comprises a firststage for BLAST.
 4. The circuit of claim 3, wherein the BLAST firststage comprises a first stage for BLASTN.
 5. The circuit of claim 3,wherein the BLAST first stage comprises a first stage for one selectedfrom the group consisting of BLASTP, BLASTX, TBLASTN, and TBLASTX. 6.The circuit of claim 3, wherein the Bloom filter stage comprises aplurality of parallel Bloom filters.
 7. The circuit of claim 6, whereinthe plurality of parallel Bloom filters are programmed with a pluralityof query substrings prior to processing the biological sequence datastream through the pipeline.
 8. The circuit of claim 7, wherein each ofthe plurality of parallel Bloom filters are programmed with the samequery substrings.
 9. The circuit of claim 8, wherein each Bloom filteris configured to access a dual port BRAM on the FPGA, wherein the dualport BRAM is double clocked to replicate a four port BRAM.
 10. Thecircuit of claim 9, wherein access to each dual port BRAM is shared byfour parallel Bloom filters.
 11. The circuit of claim 8, wherein eachBloom filter is configured to access a dual port BRAM on the FPGA,wherein the dual port BRAM is triple clocked to replicate a six portBRAM.
 12. The circuit of claim 11, wherein access to each dual port BRAMis shared by at least six parallel Bloom filters.
 13. The circuit ofclaim 8, wherein each Bloom filter is configured to utilize k hashfunctions and k (1×m) bit memory units, each memory unit correspondingto one of that Bloom filter's hash functions, each hash function beingconfigured to map received sequence strings to bits in its correspondingmemory unit to determine if a possible match exists.
 14. The circuit ofclaim 13, wherein each memory unit comprises a dual port BRAM unit. 15.The circuit of claim 13, wherein access to each dual port BRAM unit isshared by a plurality of the Bloom filters.
 16. The circuit of claim 15,wherein each dual port BRAM unit is double clocked such that it isshared by at least four of the Bloom filters.
 17. The circuit of claim7, wherein the pipeline further comprises a hashing stage downstreamfrom the Bloom filter stage, the hashing stage being configured to mapthe possible matches to at least one position within the query string.18. The circuit of claim 17, further comprising a memory incommunication with the hashing stage, the memory being configured tostore a hash table, the hash table storing at least one pointer to aposition in the query string for a possible match.
 19. The circuit ofclaim 17, wherein the hashing stage comprises a perfect hashing stage.20. The circuit of claim 17, wherein the hashing stage comprises a nearperfect hashing stage.
 21. The circuit of claim 20, wherein the nearperfect hashing stage is configured with functions A and B selected fromthe H3 family of hash functions, wherein A and B are guaranteed to be offull rank.
 22. The circuit of claim 17, further comprising a redundancyfilter downstream from the hashing stage, the redundancy filter beingconfigured to receive data representative of possible matches from thehashing stage and remove possible matches that are redundant withprevious possible matches within a selectable degree of redundancy. 23.The circuit of claim 22, wherein the redundancy filter is implemented onthe FPGA.
 24. The circuit of claim 3, further comprising a database incommunication with the FPGA, the database being configured to store abiological sequence that is to be streamed through the FPGA.
 25. Asystem for performing seeded alignment searching, the system comprising:a programmable logic device configured to implement a seeded alignmentsearching pipeline, the pipeline comprising a Bloom filter stage, theBloom filter stage being configured to receive a data stream, the datastream comprising a plurality of data strings and filter the data streamto identify a plurality of possible matches between the stream datastrings and a plurality of substrings of a query string.
 26. The systemof claim 25, wherein the programmable logic device comprises a fieldprogrammable gate array (FPGA).
 27. A method for performing biologicalsequence similarity searching, the method comprising: processing abiological sequence data stream through a Bloom filter stage implementedon a programmable logic device, the biological sequence data streamcomprising a plurality of biological sequence data strings, the Bloomfilter stage comprising a Bloom filter that is programmed with aplurality of substrings of a query string and configured to identify aplurality of possible matches between the biological sequence datastrings and the query substrings.
 28. The method of claim 27, whereinthe Bloom filter is further configured to identify a position of eachpossible match within the biological sequence data stream.
 29. Themethod of claim 28, wherein the programmable logic device comprises aFPGA.
 30. The method of claim 29, further comprising performing theprocessing step as a portion of a first stage for BLAST.
 31. The methodof claim 30, wherein the BLAST first stage comprises a first stage forBLASTN.
 32. The method of claim 30, wherein the BLAST first stagecomprises a first stage for one selected from the group consisting ofBLASTP, BLASTX, TBLASTN, and TBLASTX.
 33. The method of claim 30,wherein the Bloom filter stage comprises a plurality of parallel Bloomfilters.
 34. The method of claim 33, further comprising: programming theparallel Bloom filters with a plurality of query substrings prior toprocessing the biological sequence data stream through the Bloom filterstage.
 35. The method of claim 33, wherein the programming stepcomprises programming each of the parallel Bloom filters with the samequery substrings.
 36. The method of claim 35, wherein the Bloom filterstage comprises a plurality of dual port BRAM units, the method furthercomprising double clocking the BRAM units to replicate a four port BRAMunits.
 37. The method of claim 36, wherein each dual port BRAM unit isaccessed by four parallel Bloom filters.
 38. The method of claim 35,wherein the Bloom filter stage comprises a plurality of dual port BRAMunits, the method further comprising triple clocking the BRAM units toreplicate a six port BRAM units.
 39. The method of claim 38, whereineach dual port BRAM unit is accessed by six parallel Bloom filters. 40.The method of claim 34, further comprising mapping, via a hashing stage,each possible match from the Bloom filter stage to a position in thequery string corresponding to the query substring that caused thepossible match.
 41. The method of claim 40, wherein the mapping step isperformed at least partially by the FPGA.
 42. The method of claim 41,wherein the mapping step comprises performing the mapping via a perfecthashing stage.
 43. The method of claim 41, wherein the mapping stepcomprises performing the mapping via a near perfect hashing stage. 44.The method of claim 43, wherein the near perfect hashing stage isconfigured with functions A and B selected from the H3 family of hashfunctions, wherein A and B are guaranteed to be of full rank.
 45. Themethod of claim 40, further comprising removing possible matches thatare redundant with other possible matches within a selectable degree ofredundancy.
 46. The method of claim 45, wherein the removing step isperformed at least partially by the FPGA.
 47. A system for generating ahash table for use in mapping a set of strings to keys, the systemcomprising: a processor configured to provide near perfect hashing onthe set of strings to identify a location in a hash table, utilizing anH3 family of hash functions, corresponding to each string and togenerate the hash table, the hash table being configured to store, ateach identified location therein is a key; and a memory in communicationwith the processor for storing the hash table, wherein the processor isconfigured to provide near perfect hashing via hash functions, utilizingthe H3 family of hash functions, that are guaranteed to be of full rank.48. A system for generating a hash table for use in mapping a set ofsubstrings to a position in a larger string, the system comprising: aprocessor configured to provide near perfect hashing on the set ofsubstrings to identify a position in a hash table, corresponding to eachsubstring and to generate the hash table, the hash table beingconfigured to store, at each identified location therein, a pointer to aposition of each substring in the larger string; and a memory incommunication with the processor for storing the hash table.
 49. Adigital logic circuit for performing biological sequence similaritysearching, the circuit comprising: a programmable logic deviceconfigured to include a pipeline that comprises a matching stage, thematching stage being configured to receive a data stream comprising aplurality of possible matches between a plurality of biological sequencedata strings and a plurality of substrings of a query string and thepipeline further comprises a prefilter stage located downstream from thematching stage, the ungapped extension prefilter stage being configuredto shift through pattern matches between the biological sequence datastrings and the plurality of substrings of a query string and provide ascore so that only pattern matches that exceed a user defined score willpass downstream from the ungapped extension prefilter stage.
 50. Thecircuit of claim 49, wherein the programmable logic device comprises anFPGA.
 51. The circuit of claim 49, wherein the matching stage comprisesat least a portion of first stage for BLAST and the ungapped extensionprefilter stage comprises at least a portion of a second stage forBLAST.
 52. The circuit of claim 51, wherein the BLAST first stage isselected from the group consisting of a first stage for BLASTN, BLASTP,BLASTX, TBLASTN, and TBLASTX and the BLAST second stage is selected fromthe group consisting of a second stage for BLASTN, BLASTP, BLASTX,TBLASTN, and TBLASTX.
 53. The circuit of claim 49, wherein the wordmatching stage further comprises a Bloom filter stage.
 54. The circuitof claim 53, wherein the Bloom filter stage includes a plurality ofBloom filters.
 55. The circuit of claim 53, wherein the word matchingstage further comprises a hashing stage downstream from the Bloom filterstage, the hashing stage being configured to map the possible matches toat least one position within the query string.
 56. The circuit of claim55, further comprising a redundancy filter downstream from the hashingstage, the redundancy filter being configured to receive datarepresentative of possible matches from the hashing stage and removepossible matches that are redundant with previous possible matcheswithin a selectable degree of redundancy.
 57. The circuit of claim 49,wherein the ungapped extension prefilter stage comprises an extensioncontroller, a window lookup module and a scoring module.
 58. The circuitof claim 49, wherein the ungapped extension prefilter stage utilizesparameters selected from the group consisting of a matching score, amismatching score and a cutoff threshold.
 59. The circuit of claim 49,wherein the ungapped extension prefilter stage comprises an extensioncontroller.
 60. The circuit of claim 59, wherein the extensioncontroller comprises at least one of hardware, firmware and software.61. The circuit of claim 59, wherein the extension controller comprisesa plurality of data inputs including at least one first input forplurality of possible pattern matches of a predetermined length betweena plurality of biological sequence data strings and a plurality ofsubstrings of a query string and at least one second input forbiological sequence data.
 62. The circuit of claim 61, wherein at leastone second input can receive commands for controlling parametersselected from the group consisting of a match score, a mismatch score ora cutoff threshold.
 63. The circuit of claim 57, wherein the extensioncontroller parses the biological data received by the at least onesecond input to demultiplex a shared command and the pattern matches ofa predetermined length prior to submission into the window lookupmodule.
 64. The circuit of claim 49, wherein the ungapped extensionprefilter stage comprises a window lookup module.
 65. The circuit ofclaim 64, wherein the window lookup module can fetch at least onesubstring of the biological sequence data and at least one substring ofa query string to form a predefined window of biological data that canbe utilized for analysis.
 66. The circuit of claim 64, wherein thewindow lookup module receives at least one substring of the biologicalsequence data in a database buffer, at least one substring of a querystring in a query buffer, an offset for at least one substring of thebiological sequence data and an offset for at least one substring of aquery string, wherein the window lookup module calculates a beginning ofa window that is passed to a plurality of buffer modules, wherein theplurality of buffer modules move the at least one substring of a querystring and the at least one substring of the biological sequence data toan output stage to create at least one predefined window of biologicaldata that can be utilized for analysis.
 67. The circuit of claim 65,wherein the at least one substring of the biological sequence data andat least one substring of a query string are buffered on a plurality ofBRAMS.
 68. The circuit of claim 49, wherein the ungapped extensionprefilter stage comprises a scoring module.
 69. The circuit of claim 68,wherein the scoring module is capable of fetching biological data withina predefined window.
 70. The circuit of claim 69, wherein the scoringmodule utilizes at least one iteration of algorithm instructions. 71.The circuit of claim 68, wherein the scoring module further comprises abase comparator, a plurality of scoring stages and a thresholdcomparator.
 72. The circuit of claim 68, wherein the scoring modulefurther comprises a base comparator.
 73. The circuit of claim 72,wherein the base comparator receives at least one pair of a substring ofthe biological sequence data and a substring of a query string andcompares these values and obtains at least one comparison score.
 74. Thecircuit of claim 73, wherein the at least one comparison score includesa positive value for each matching pair and a negative score for eachmismatching pair.
 75. The circuit of claim 73, wherein the at least onecomparison score is substituted from values located on a lookup table.76. The circuit of claim 68, wherein the scoring module furthercomprises a plurality of scoring stages.
 77. The circuit of claim 76,further comprising discarding at least one comparison score from eachsubsequent downstream scoring stage of the plurality of scoring stages.78. The circuit of claim 76, wherein at least one scoring stage of theplurality of scoring stages receives input selected from the group ofcomparisons of biological sequence data in addition to biologicalsequence database and query positions within a predetermined window ofdata and at least one scoring stage of the plurality of scoring stagesreceives input from at least one other scoring stage of the plurality ofscoring stages, wherein the input is selected from the group consistingof scores of high-scoring segment pairs and endpoints of high-scoringsegment pairs.
 79. The circuit of claim 68, wherein the scoring modulefurther comprises a threshold comparator.
 80. The circuit of claim 79,wherein the threshold comparator obtains a pattern match of apredetermined length that includes at least one score and determines ifthe at least one score exceeds a predetermined threshold value andpasses the pattern match downstream.
 81. The circuit of claim 79,wherein the threshold comparator obtains a pattern match of apredetermined length and determines if a high scoring substringintersects a boundary of a predetermined window of data.
 82. A methodfor performing biological sequence similarity searching, the methodcomprising: processing a biological sequence data stream with aprogrammable logic device configured to include a pipeline thatcomprises a matching stage, the matching stage being configured toreceive a data stream comprising a plurality of possible matches betweena plurality of biological sequence data strings and a plurality ofsubstrings of a query string; and utilizing a ungapped extensionprefilter stage located downstream from the matching stage in thepipeline, the ungapped extension prefilter stage being configured toshift through pattern matches between the biological sequence datastrings and the plurality of substrings of a query string and provide ascore so that only pattern matches that exceed a user defined score willpass downstream from the ungapped extension prefilter stage.
 83. Themethod of claim 82, wherein the programmable logic device comprises anFPGA.
 84. The method of claim 82, wherein the matching stage includesutilizing at least a portion of a first stage for BLAST and the ungappedextension prefilter stage includes utilizing at least a portion of asecond stage for BLAST.
 85. The method of claim 84, wherein the BLASTfirst stage is selected from the group consisting of a first stage forBLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX and the BLAST second stageis selected from the group consisting of a second stage for BLASTN,BLASTP, BLASTX, TBLASTN, and TBLASTX.
 86. The method of claim 82,comprises utilizing a Bloom filter stage in the matching stage of thepipeline.
 87. The method of claim 86, wherein the Bloom filter stageincludes a plurality of Bloom filters.
 88. The method of claim 86,further comprising utilizing a hashing stage downstream from the Bloomfilter stage and mapping the possible matches to at least one positionwithin the query string with the hashing stage.
 89. The method of claim88, further comprising utilizing a redundancy filter downstream from thehashing stage, with the redundancy filter being configured to receivedata representative of possible matches from the hashing stage andfurther comprising removing possible matches that are redundant withprevious possible matches within a selectable degree of redundancy. 90.The method of claim 82, wherein the ungapped extension prefilter stagecomprises an extension controller, a window lookup module and a scoringmodule.
 91. The method of claim 82, further comprising utilizingparameters selected from the group consisting of a match score, amismatch score or a cutoff threshold with the ungapped extensionprefilter stage.
 92. The method of claim 82, wherein the ungappedextension prefilter stage comprises an extension controller.
 93. Themethod of claim 92, wherein the extension controller comprises at leastone of hardware, firmware and software.
 94. The method of claim 92,further comprising receiving a plurality of possible pattern matches ofa predetermined length between a plurality of biological sequence datastrings and a plurality of substrings of a query string as at least onefirst input and receiving biological sequence data with at least onesecond input.
 95. The method of claim 94, further comprising receivingcommands for controlling parameters selected from the group consistingof a match score, a mismatch score or a cutoff threshold with the atleast one second input.
 96. The method of claim 90, further comprisingparsing the data received by the at least one second input todemultiplex the shared command and the pattern matches of apredetermined length prior to submission into the window lookup module.97. The method of claim 82, wherein the ungapped extension prefilterstage comprises a window lookup module.
 98. The method of claim 97,further comprising fetching at least one substring of the biologicalsequence data and at least one substring of a query string to form apredefined window of biological data for analysis with the window lookupmodule.
 99. The method of claim 97, further comprising: receiving atleast one substring of the biological sequence data in a database bufferwith the window lookup module; receiving at least one substring of aquery string in a query buffer with the window lookup module; receivingan offset for the at least one substring of the biological sequence dataand an offset for the at least one substring of a query string that isutilized to calculate a beginning of a window; moving the offset for theat least one substring of the biological sequence data and the offsetfor at least one substring of a query string with a plurality of buffermodules; and creating at least one predefined window of biological datafor analysis based on the at least one substring of the biologicalsequence data, the at least one substring of a query string, the offsetfor the at least one substring of the biological sequence data and theoffset for at least one substring of a query string.
 100. The method ofclaim 98, further comprising buffering the at least one substring of thebiological sequence data and at least one substring of a query string ona plurality of BRAMS.
 101. The method of claim 82, wherein the ungappedextension prefilter stage comprises a scoring module.
 102. The method ofclaim 101, further comprising fetching biological data within apredefined window with the scoring module.
 103. The method of claim 101,further utilizing at least one iteration of algorithm instructions withthe scoring module.
 104. The method of claim 101, wherein the scoringmodule further comprises a base comparator, a plurality of scoringstages and a threshold comparator.
 105. The method of claim 101, whereinthe scoring module further comprises a base comparator.
 106. The methodof claim 105, further comprising receiving at least one pair of asubstring of the biological sequence data and a substring of a querystring and comparing these values and obtaining at least one comparisonscore with the base comparator.
 107. The method of claim 106, whereinthe at least one comparison score includes a positive value for eachmatching pair and a negative score for each mismatching pair.
 108. Themethod of claim 106, further comprising substituting the at least onecomparison score from values located on a lookup table.
 109. The methodof claim 101, wherein the scoring module further comprises a pluralityof scoring stages.
 110. The method of claim 109, further comprisingdiscarding at least one comparison score for each stage of the pluralityof scoring stages.
 111. The method of claim 109, further comprisingreceiving input selected from the group of comparisons of biologicalsequence data and biological sequence database and query positionswithin a predetermined window of data with at least one scoring stage ofthe plurality of scoring stages and further comprising receiving inputis selected from the group consisting of scores of high-scoring segmentpairs and endpoints of high-scoring segment pairs from at least oneother scoring stage of the plurality of scoring stages.
 112. The methodof claim 101, wherein the scoring module further comprises a thresholdcomparator.
 113. The method of claim 112, further comprising obtaining apattern match of a predetermined length that includes at least one scoreand determines if the at least one score exceeds a predeterminedthreshold value with the threshold comparator and passing the patternmatch downstream.
 114. The method of claim 112, further comprisingobtaining a pattern match of a predetermined length and determines if ahigh scoring substring intersects a boundary of a predetermined windowof data with the threshold comparator.